Linear Depth Stabilizer and Quantum Fourier Transformation Circuits with
no Auxiliary Qubits in Finite Neighbor Quantum Architectures 
In this paper we investigate how quantum architectures affect the efficiency of the execution of the Quantum
Fourier Transform (QFT) and linear transformations, which are essential parts of the stabilizer/Clifford group
circuits. In particular, we show that in the most common and realistic physical architectures, including Linear
Nearest Neighbor (LNN), 2D lattice, and bounded degree graph (containing a chain of length n), n-qubit QFT
and n-qubit stabilizer circuits can be parallelized to linear depth using no auxiliary qubits. We construct lower
bounds that show the efficiency of our approach.
I. INTRODUCTION

Quantum computation has attracted attention because it appears
to reduce the computational complexity of certain calculations,
see, for example, [1, 2]. For the quantum circuit model
of computation, there exist a number of physical quantum
information processing implementations, such as liquid NMR
(up to 12 qubits at a time) [3], and trapped ions (8 qubits) [4].
Generally, a large number of qubits is required for computa- 
tional purposes. In this work, we do not allow for any auxil- 
iary qubits to be used in order to reflect the apparent hardness 
of scaling up quantum information processing devices. 

Quantum circuits have been optimized to require less space, 
fewer gates and smaller depth. This is important from the 
point of view of the efficient potential realization of the quan- 
tum algorithms. As discussed in the first paragraph, we ad- 
dress the issue of space minimization by restricting the num- 
ber of auxiliary qubits to zero. Our next focus is on depth 
minimization. This is because a small depth circuit does not 
only mean a fast computation, but also helps reduce the effect 
of decoherence. For instance, it is possible to construct re- 
alistic examples in which a smaller depth circuit will require 
fewer levels of error correction, and each error correction code 
concatenation step is a very expensive operation [5]. 

In this paper, the depth of a circuit is defined as the number 
of logic levels in it. Each logic level is a set of non-intersecting 
"elementary" gates. It is generally accepted that in a prac- 
tical quantum information processing approach, it should be 
possible to execute independent gates in parallel. The gate 
libraries considered in the relevant literature include a set of 
single-qubit and CNOT gates (which is most likely an arti- 
fact of the well known result showing the completeness of 
this gate set, however, CNOT may not necessarily be a natu- 
ral gate for some quantum information processing proposals), 
and any two-qubit operation. Indeed, given a Hamiltonian,
any two-qubit operation can be efficiently implemented.
For the sake of completeness, this paper discusses how the
results apply to both gate sets. Circuit depth, as defined above,
upper bounds a possibly lower circuit runtime in cases when
the next logic level can be executed based upon the availability
of qubits and before execution of the gates from the previous
level has completed. Practically, this means that some of our
upper bounds may not be tight (which is advantageous in the
sense that the implementation we construct may in fact have
a smaller runtime than predicted by the formulas evaluating
depth).

Quantum algorithms and their circuits are usually formu- 
lated without considering the physical limitations imposed by 
different architectures. We believe that circuit and algorithm 
designs need to be modified to account for possible architec- 
tures. In particular, in realistic architectures, it is not possible 
to establish direct interactions between every pair of qubits 
111 S 0]. A study of quantum computing architectures for 
the existing and emerging quantum technologies shows that 
the fastest possible direct interactions form a bounded degree 
graph (e.g., liquid NMR quantum information processing), 
and 1D or 2D (sub)lattices [8]. A mixed architecture, where
values of stationary qubits may be teleported with the help of 
flying qubits to where they are desired, was studied in [9]. In
that work, the role of stationary qubits is played by the spins
of phosphorus atoms embedded in silicon, known as the Kane
proposal [10], and the flying qubits are photons, with the
information being teleported via EPR states. Other proposals for
state transfer between either stationary or both flying and
stationary qubits, and discussions of mixed architectures, can be
found, for example, in [11, 12, 13] and the references therein.
However, an architecture that allows interconversion between 
stationary and flying qubits cannot in general be realized in 
any technology. In addition, it was shown that teleportation 
of a single value (simultaneous teleportation of many qubits 
may be less efficient) in the Kane architecture is only efficient 
if compared to more than 2-4 levels of SWAPs [9]. A similar 
effect is likely to take place in other mixed architecture pro- 
posals. The latter is important for this work since we are only 
using depth-1 swapping of multiple qubits via SWAP gates.

Generally speaking, due to the spatial constraints it seems 
unrealistic to believe that a direct scalable implementation of 
the unrestricted (where every two qubits are neighbors) archi- 
tecture, or, more generally, unbounded neighbor architecture, 
will ever be found. Furthermore, in classical computation the 
number of neighbors is limited, and there is no obvious reason
to believe that the quantum world is different. Thus, the
complexity of the circuit designs must be refined to take into
account the limitations of possible quantum computing
architectures.

The linear nearest neighbor (LNN) architecture, also known 
as chain nearest neighbor, is often considered as a good (and, 
in fact, very restrictive) approximation to what a scalable 
quantum architecture may be. Mathematically, in an LNN 
architecture with n qubits $q_1, q_2, \ldots, q_n$, two-qubit gates are
allowed between any qubits whose subscript values differ by
one. The LNN architecture describes 1D lattices. It misses
possible direct interactions in 2D lattices and may restrict the 
number of useful interactions in connected graphs. However, 
if one can show that a circuit can be efficiently reorganized to 
be executed in the LNN architecture, such a circuit could be 
run efficiently in many other architectures. 

The Quantum Fourier Transformation (QFT) is an analogue 
of the classical discrete Fourier transformation, however, in 
the quantum case the transformation is applied to the ampli- 
tudes. The QFT serves as a basis for a number of efficient 
quantum algorithms. Most notably, it is at the heart of inte- 
ger factorization and the discrete logarithm polynomial time 
quantum algorithms [2]. Therefore, efficient implementation 
of the QFT is important. This is why this topic has been stud- 
ied extensively 13 EH Gil- Researchers presented linear and 
logarithmic depth circuits using a number of auxiliary qubits. 
Known circuits for the QFT have a regular structure [16].
However, they require direct interaction between every two 
qubits, which makes such circuits especially inconvenient for 
quantum architectures where only a finite number of neigh- 
bors is allowed. In an architecture with a finite number of 
neighbors, such as LNN, state transfer down the chain may 
require up to (n - 1) SWAP gates. We refer to this observation
as the locality constraint in the discussions involving
lower bound arguments. A linear depth QFT circuit implemented
in the LNN architecture has been reported in [17]. We
reconstruct this circuit with our generalized technique and we
also study lower bounds.

Stabilizer circuits (also known as unitary stabilizer circuits 
or Clifford group circuits) were introduced and studied for 
their use in the encoding, decoding and error detection stages 
of quantum error-correction codes [18, 19]. They can be
defined as arbitrary quantum circuits composed of single-qubit
Hadamard and Phase gates and two-qubit controlled-NOT
gates. It turns out that stabilizer circuits can be efficiently
simulated [20] as an 11-stage sequence of Hadamard
(H), Phase (P) and linear reversible circuits (C) as
H-C-P-C-P-C-H-P-C-P-C. Each P and H stage is a depth-1 computation
composed with single-qubit gates. The depth of stabilizer cir- 
cuits is, thus, defined by the depth of a circuit realizing some 
linear reversible function. Efficient circuits for linear func- 
tions are, therefore, of great importance. In this paper we 
show that every stage C can be parallelized to linear depth 
in the LNN architecture. Thus, the entire stabilizer circuit re- 
quires at most linear time to be executed. 

A very recent study shows that a size s stabilizer circuit 
in an unrestricted architecture can be parallelized to a depth 



$O(\log n)$ circuit, but requires $O(s^3 + n)$ auxiliary qubits [21],
Proposition 8.9. Combining the results of [20, 21, 22], this
gives a depth $O(\log n)$ circuit in unrestricted architectures
using $O(n^6/\log^3 n)$ auxiliary qubits to realize any stabilizer circuit.
Since a depth d circuit built on q qubits in unrestricted
architectures may become as large as depth $O(qd)$ in the LNN
architecture (every depth-2 computation can be adversarially made
to define the complete interaction pattern of the LNN
architecture, and two depth-2 non-commuting stages can be
defined such as to require a linear depth qubit permutation
between them), the benefit of logarithmic depth quickly
disappears. However, a large number of auxiliary qubits remains.
Our approach thus appears more practical.

The remainder of the paper is organized as follows.
We start by introducing the concept of skeleton circuits and
studying their properties. In Subsections II A and II B the
lessons learned are applied to show that QFT and linear
reversible/stabilizer circuits can be parallelized to linear depth
in the LNN architecture. Section III reports lower bounds for
a class of skeleton circuits which appears to be very important.
Concluding remarks can be found in Section IV.



II. SKELETON CIRCUITS 

Any quantum circuit composed with single-qubit and two- 
qubit gates can be thought of as a circuit composed of generic 
two-qubit operations each of which consists of a two-qubit 
gate of the initial circuit with the surrounding gates absorbed 
into it (the trivial case when only single-qubit gates are applied 
to a specific qubit throughout an entire computation is ignored 
as not interesting). We call this a skeleton circuit. Obviously, 
the complexity of a skeleton circuit defines the complexity 
of the initial circuit (assuming that any two-qubit gate has a 
finite cost) and vice versa. We next study skeleton circuits of a 
certain type and apply the lessons learned to construct circuits 
for QFT and linear reversible/stabilizer circuits of linear depth 
in the LNN architecture. 

The basic skeleton circuit we consider is illustrated in Fig.
1(a). Mathematically, the skeleton circuit SC is defined as

$$SC := G_1^{i_1}(q_1,q_2)\, G_2^{i_2}(q_1,q_3) \cdots G_{n-1}^{i_{n-1}}(q_1,q_n)\, G_n^{i_n}(q_2,q_3) \cdots G_{n(n-1)/2}^{i_{n(n-1)/2}}(q_{n-1},q_n), \qquad (1)$$

where $G_*$ ($*$ is reserved to represent any possible existing
value of subscript) is a two-qubit gate that operates on the
qubits indicated in brackets, $i_*$ take Boolean values, and for a
gate $G$, $G^1$ is the gate $G$ itself, whereas $G^0 = \mathrm{Id}$ (identity, i.e.,
this gate is not applied). In other words, $i_*$ are used to indicate
whether a gate is present or not.

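Since all quantum gates that operate on non-intersecting
sets of qubits commute, the SC circuit can be executed in
parallel in (2n - 3) computational stages $L_1, L_2, \ldots, L_{2n-3}$ defined
as follows: $L_1 := G_1^{i_1}$, $L_2 := G_2^{i_2}$, $L_3 := G_3^{i_3} G_n^{i_n}$,
$L_4 := G_4^{i_4} G_{n+1}^{i_{n+1}}$, $L_5 := G_5^{i_5} G_{n+2}^{i_{n+2}} G_{2n-2}^{i_{2n-2}}$, ...,
$L_{2n-3} := G_{n(n-1)/2}^{i_{n(n-1)/2}}$. In other words, the gate acting on
$q_a$ and $q_b$ ($a < b$) is placed in stage $L_{a+b-2}$. This is
illustrated in Fig. 1(b) in the case n = 5.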
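
The stage assignment can be checked mechanically. The following is a minimal Python sketch, assuming the rule that the gate on $q_a$ and $q_b$ (a < b) goes to stage a + b - 2; the function names `skeleton_stages` and `is_valid_schedule` are ours and purely illustrative.

```python
from itertools import combinations


def skeleton_stages(n):
    """Group all gates G(q_a, q_b), 1 <= a < b <= n, into stages L_1..L_{2n-3}.

    Assumption: gate (a, b) is placed in stage a + b - 2.
    """
    stages = {k: [] for k in range(1, 2 * n - 2)}
    for a, b in combinations(range(1, n + 1), 2):
        stages[a + b - 2].append((a, b))
    return stages


def is_valid_schedule(stages):
    """Every stage must consist of gates acting on pairwise disjoint qubits."""
    for gates in stages.values():
        used = [q for gate in gates for q in gate]
        if len(used) != len(set(used)):
            return False
    return True


if __name__ == "__main__":
    for stage, gates in skeleton_stages(5).items():
        print(f"L_{stage}: {gates}")
```

For n = 5 this yields seven stages, and no qubit appears twice within a stage, so each stage is a legitimate logic level.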
FIG. 1: Reorganizing an n-qubit skeleton circuit, illustrated for n = 5. (a) Original circuit with at most n(n-1)/2 gates. Each of the gates in this
skeleton circuit may or may not be present. (b) Linear (2n - 3) depth circuit possible to run in the "sea-of-qubits" architecture. (c) Version
of (b) ready for execution in the LNN architecture. (d) This table illustrates how swapping stages $S_*$ are constructed and inserted between the
computational stages $L_*$.



Next, the circuit can be adapted to the LNN architecture
through inserting SWAP gates SWAP$(q_s,q_t)$ after each gate
$G_*^{i_*}(q_s,q_t)$. This is illustrated in Fig. 1(c) and (d) in the case
n = 5. In the gate library containing all possible 2-qubit
unitaries, the upper bound for depth is (2n - 3). We next use this
result to achieve linear depth circuits for QFT and stabilizer
circuits. These are fairly tight upper bounds. With the best
known asymptotic result requiring $\Theta(n^2)$ gates for the QFT, it
can be shown that QFT cannot be computed in less than linear
depth even in an unrestricted architecture. A counting
argument applied to linear circuits [22] shows that there exists a
stabilizer circuit that requires $\Omega(n^2/\log n)$ gates, meaning
that it is impossible to find a circuit for it with depth less than
$O(n/\log n)$ even if the architecture is unrestricted. Lower bounds
in restricted architectures (all of which turn out to be linear,
and thus having the same asymptotic as the upper bound that
follows from our construction) are studied in Section III.

Let us note that the skeleton circuit that we consider can 
be parallelized to linear depth in the LNN architecture for any 
initial permutation of the input and return the output in any 
desired order. For that, at most a linear depth swapping stage 
before and after the circuit is required, which does not change 
the overall linearity of the depth. The circuit illustrated in
Fig. 1(c) not only allows execution in the LNN architecture, it
also does not change the LNN connectivity pattern
($q_1 - q_2 - \ldots - q_n$), and thus such circuits can be applied one after the
other with no swapping in between. This observation will be
used in Subsection II B. If the circuit in Fig. 1(c) is the last
computational stage before the measurement is done, the last
SWAP need not be applied.
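
As an illustration, the LNN version of the skeleton circuit can be simulated on a chain. The sketch below is a hedged reading of the construction, not the paper's own code: it assumes the chain starts as $q_1, \ldots, q_n$, that the gate on $q_a$ and $q_b$ (a < b) belongs to stage a + b - 2, and that each gate together with its trailing SWAP counts as one generic two-qubit level. The name `run_on_lnn` is hypothetical.

```python
def run_on_lnn(n):
    """Simulate the skeleton circuit on a chain of n qubits.

    Returns (final qubit order along the chain, number of levels used).
    Raises ValueError if some gate would act on non-neighboring qubits.
    """
    order = list(range(1, n + 1))        # order[p] = label of qubit at position p
    levels = 0
    for stage in range(1, 2 * n - 2):
        total = stage + 2                # a + b for every gate in this stage
        for a in range(max(1, total - n), (total - 1) // 2 + 1):
            b = total - a
            pa, pb = order.index(a), order.index(b)
            if abs(pa - pb) != 1:
                raise ValueError(f"q_{a} and q_{b} are not neighbors")
            # The gate G(q_a, q_b) is applied here, then the inserted SWAP.
            order[pa], order[pb] = order[pb], order[pa]
        levels += 1
    return order, levels


if __name__ == "__main__":
    print(run_on_lnn(5))
```

Under these assumptions every gate meets its partner on neighboring sites, and the chain ends up in reversed order, which is again a nearest-neighbor chain over the same qubits.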
FIG. 2: (a) Circuit for n-qubit QFT [5], page 219, illustrated for
n = 6. The two-qubit gates are controlled-Z rotations with parameter
$1/2^k$, where k is the subscript in the gate notation. The single-qubit
gates are Hadamard gates. (b) Skeleton circuit of the QFT circuit in
(a) composed of generic two-qubit gates.



A. QFT in the LNN architecture

A circuit that realizes the QFT and requires no ancilla
qubits is illustrated in Fig. 2(a). Its skeleton circuit (Fig.
2(b)) is obviously of the type considered in the previous
section with all $i_* = 1$. Therefore, the QFT can be parallelized to
linear depth. This is, however, a known result, as [17] reports a
construction that is equivalent to ours. It can also be observed
that the approximate QFT circuit, where controlled rotations
of the QFT circuit with small parameters are ignored, may
be executed in linear depth in the LNN architecture. Lower
bounds are discussed in Section III and they apply directly to
the QFT circuit.



B. Stabilizer/linear circuits



Synthesis of efficient linear circuits has been studied in
[22]. The authors report a synthesis algorithm capable of
producing a circuit with $O(n^2/\log n)$ CNOT gates. It was also proven
that their synthesis is asymptotically optimal in that there
exists a linear function that requires $\Theta(n^2/\log n)$ CNOT gates. In
this paper, the goal is different. We target minimization of the
depth as opposed to the number of gates used. The depth of
our circuit is linear in the number of qubits n, and it is
upper bounded by 18n + O(1) CNOTs (assuming every SWAP
is substituted with a suitable 3-CNOT implementation) or
6n + O(1) generic two-qubit gates. We also prove asymptotic
optimality, which in our case is straightforward.

Every reversible linear function of n variables
$q = (q_1, q_2, \ldots, q_n)^t$ can be written as matrix multiplication $Aq$,
where A is an n x n Boolean non-singular matrix. Synthesizing
such a function is equivalent to composing a sequence of
gate operations that transforms matrix A into its reduced
echelon form. Due to reversibility, the reduced echelon form of A
is the identity matrix. A standard technique for transforming a
matrix A to the identity is to apply the Gauss-Jordan
elimination algorithm. In the following, we illustrate the application
of the Gauss-Jordan elimination algorithm and then modify its
circuit to allow it to be executed with a linear number of
computational stages. Parameters $i_*$ and $p_*$ take Boolean values and
they are used to indicate whether the gate has been applied (1)
or not (0). Parameters $p_*$ are reserved for the gates applied to
update values of the diagonal elements of the matrix A during
Gauss-Jordan elimination.

• Step 1. Make sure that the pivot element $a_{1,1} \neq 0$. If
$a_{1,1} \neq 0$ assign $p_1 := 0$. Otherwise choose $a_{j,1} \neq 0$,
apply gate CNOT$(q_j, q_1)$ and make assignment $p_1 := 1$.

• Steps s = 2..n. Transform each $a_{s,1}$ to 0 through
application (if needed) of the gate CNOT$(q_1, q_s)$. If at step s
a gate was applied set $i_s := 1$, otherwise, $i_s := 0$.

• Step n + 1. Make sure that the pivot element $a_{2,2} \neq 0$. If
$a_{2,2} \neq 0$ do nothing ($p_2 := 0$), otherwise choose $a_{j,2} \neq 0$,
apply gate CNOT$(q_j, q_2)$ and set $p_2 := 1$.

• Steps s = (n + 2)..(2n - 1). Transform each $a_{s-n+1,2}$
to 0 through application (if needed) of the gate
CNOT$(q_2, q_{s-n+1})$. If at step s a gate was applied set
$i_s := 1$, otherwise, $i_s := 0$.

...

• Step n(n+1)/2 - 2. Make sure that the pivot element
$a_{n-1,n-1} \neq 0$. If $a_{n-1,n-1} \neq 0$ do nothing ($p_{n-1} := 0$),
otherwise apply gate CNOT$(q_n, q_{n-1})$ and make
assignment $p_{n-1} := 1$. After this step, all parameters $p_*$ must
be set.

• Step n(n+1)/2 - 1. Transform $a_{n,n-1}$ to 0 through
application (if needed) of the gate CNOT$(q_{n-1}, q_n)$.
If the gate was applied set $i_{n(n+1)/2-1} := 1$, otherwise,
$i_{n(n+1)/2-1} := 0$. At this point, the set of applied
transformations has reduced matrix A to the upper triangular form
with ones on the diagonal. The remainder of the algorithm
eliminates non-zero elements above the diagonal.

• Steps s = n(n+1)/2..(n^2 - 1). If $a_{k,l} \neq 0$, apply
CNOT$(q_l, q_k)$ for k = 1..(l - 1) inside for l = n..2 and set $i_s$
to one iff a gate has been applied.

FIG. 3: Application of Gauss-Jordan elimination algorithm to the
synthesis of a reversible network. Gates with controls (?) indicate
a single CNOT each with the control at (exactly) one of the positions
marked (?).
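
The elimination steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the paper's implementation: indices are 0-based, CNOT(c, t) is modeled as the row operation "row t becomes row t XOR row c", and the circuit computing q -> Aq is taken to be the recorded row operations in reverse order. The names `gauss_jordan_cnots`, `apply_cnots` and `circuit_for` are hypothetical.

```python
from itertools import product


def gauss_jordan_cnots(A):
    """Return CNOT(control, target) row operations reducing A to the identity.

    A is an invertible 0/1 matrix given as a list of rows.
    """
    n = len(A)
    M = [row[:] for row in A]
    gates = []

    def cnot(c, t):
        M[t] = [x ^ y for x, y in zip(M[t], M[c])]
        gates.append((c, t))

    # Phase 1: fix each pivot, then clear the entries below it.
    for col in range(n):
        if M[col][col] == 0:
            below = [r for r in range(col + 1, n) if M[r][col]]
            if not below:
                raise ValueError("matrix is singular over GF(2)")
            cnot(below[0], col)
        for r in range(col + 1, n):
            if M[r][col]:
                cnot(col, r)

    # Phase 2: clear the entries above the diagonal, last column first.
    for col in range(n - 1, 0, -1):
        for r in range(col):
            if M[r][col]:
                cnot(col, r)
    return gates


def apply_cnots(gates, q):
    """Apply CNOTs in time order to the bit vector q."""
    q = list(q)
    for c, t in gates:
        q[t] ^= q[c]
    return q


def circuit_for(A):
    """CNOT circuit (time order) computing q -> A q: the reduction reversed."""
    return gauss_jordan_cnots(A)[::-1]
```

The reduction uses at most n^2 - 1 CNOTs, in line with the step count of the list: (n - 1) pivot fixes plus n(n-1)/2 eliminations below and above the diagonal.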

We next use the gate commutation rule (two CNOT gates
commute iff the target of one gate is not equal to the control
of the other) and the circuit identity CNOT(a,c)CNOT(c,b) =
CNOT(c,b)CNOT(a,b)CNOT(a,c) to move all (n - 1) gates
CNOT(a,c) with parameter $p_*$ to the front of the network.
Note that every time the commutation rule is used, the gates
just change their position, and every time the circuit identity
is applied we introduce a new gate CNOT(a,b). However,
such a gate can always be commuted to the closest
CNOT(a,b) on its left, and this is accounted for by the updates to the
$i_*$ gate presence indicator. The circuit gets transformed to the
one illustrated in Fig. 4. Parameters $i_*$ are changed through
XORing each $i_j$, $j < n(n+1)/2$, with $p_k$, for k < n such that $q_k$ is
the target of the gate used at step j. The constructed circuit
consists of three parts marked I-III in Fig. 4. The skeleton of
each of these parts is described by equation (1), which is
obvious for parts II and III and requires a short explanation for part
I. Divide the skeleton circuit (Fig. 1(a)) into (n - 1) parts with
the first containing the first (n - 1) gates, the second containing
the next (n - 2) gates, and so on, with the last, (n - 1)st, part containing
one last gate. Then, gate $G_i$ for i = 1..n - 1 from part I of
the circuit in Fig. 4 can be matched (via "skeletonization") to
some gate in the i-th part of the skeleton circuit SC. Thus,
every linear reversible function can be computed as a circuit of
depth at most 3(2n - 3) = 6n + O(1). Furthermore, since each
SWAP-CNOT pair can be rewritten as two CNOTs (Fig. 5)
and SWAP requires no more than 3 CNOT gates, the overall
depth in terms of CNOTs can be upper bounded by the
expression 18n + O(1). We note that in some quantum
information processing proposals the pair CNOT-SWAP can be executed
more efficiently than a single CNOT or a single SWAP, such
as in [23], Fig. 1. Due to the locality constraint our upper
bound has the same asymptotic as a lower bound, and thus
our circuits are asymptotically optimal. Using the
H-C-P-C-P-C-H-P-C-P-C decomposition for stabilizer circuits [20], these
upper bounds directly translate to a circuit of depth at most 30n + O(1)
composed of generic two-qubit gates, or a circuit of depth at most
90n + O(1) in the library with single-qubit and CNOT
gates.

FIG. 4: Gauss-Jordan elimination algorithm network with rearranged
gates.

FIG. 5: 2-CNOT circuit equivalent to a SWAP-CNOT pair.

FIG. 6: General structure of the (a) encoding and (b) syndrome
detection circuits for CSS quantum error correcting codes.
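
Both circuit identities used in this subsection can be verified by brute force. The sketch below assumes each gate product is read left to right in time order and tracks classical bit values only; since CNOT and SWAP circuits permute computational basis states without introducing phases, agreement on all basis states implies equality of the circuits. All names are illustrative.

```python
from itertools import product


def cnot(state, control, target):
    """Return the state after CNOT with the given control and target wires."""
    s = list(state)
    s[target] ^= s[control]
    return tuple(s)


def swap(state, x, y):
    """Return the state after swapping wires x and y."""
    s = list(state)
    s[x], s[y] = s[y], s[x]
    return tuple(s)


def run(circuit, state):
    """Apply (operation, wire, wire) triples in time order."""
    for op, x, y in circuit:
        state = op(state, x, y)
    return state


def equivalent(c1, c2, n_wires):
    """True iff both circuits agree on every computational basis state."""
    return all(run(c1, s) == run(c2, s)
               for s in product((0, 1), repeat=n_wires))


A, B, C = 0, 1, 2  # wire labels a, b, c

# CNOT(a,c)CNOT(c,b) = CNOT(c,b)CNOT(a,b)CNOT(a,c)
identity_lhs = [(cnot, A, C), (cnot, C, B)]
identity_rhs = [(cnot, C, B), (cnot, A, B), (cnot, A, C)]

# A CNOT followed by a SWAP on the same wires equals two CNOTs.
cnot_then_swap = [(cnot, A, B), (swap, A, B)]
two_cnots = [(cnot, B, A), (cnot, A, B)]
```

The second check is what lets each CNOT-SWAP pair cost two CNOT levels instead of four.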



1. Encoding and error syndrome circuits for CSS codes 

Encoding and error syndrome circuits for CSS codes are of
great practical importance due to the clever error correcting
properties of the CSS codes. Such circuits include those
illustrated in Fig. 6(a) (encoding; [24]) and Fig. 6(b) (error
syndrome; [5]), where single-qubit Hadamard gates are not
illustrated since their contribution to the total depth is only
a constant, and the controlled gates, each of which may or
may not be present (which is defined by the form of the parity
check matrices of the corresponding classical codes), are
either controlled-NOT or controlled-Z. Our circuit parallelization
technique described in the previous subsection applies
directly to such circuits since each of them has a skeleton as
described by Eq. (1) with n = s + t + 1 for the encoding
circuit and n = s + t for the error syndrome circuit. This
allows us to execute the encoding circuit in (2s + 2t - 1) stages
and the error syndrome circuit in (2s + 2t - 3) stages (in both cases,
s, t > 1) composed of generic two-qubit gates.
However, a better approach is possible. The following construction
is, essentially, a part of the algorithm used to execute SC.

Consider the encoding circuit (Fig. 6(a)). Prepare the qubits in
the following LNN connectivity pattern:
$a_1 - a_2 - \ldots - a_s - b - c_t - c_{t-1} - \ldots - c_1$. At each level i apply gates whose
targets intersect with the sloping lines marked "level i" shown
in Fig. 6(a). Each such level is followed by the level of
SWAPs applied to the same qubits as the gates from the
previous level to allow for the next set of gates to get executed
in the LNN architecture. For example, for s = 3 and t = 4,
level 3 will be composed of the gates $G(b, c_2)$, $G(a_3, c_3)$, and
$G(a_2, c_4)$, followed by the swaps SWAP$(b, c_2)$, SWAP$(a_3, c_3)$,
and SWAP$(a_2, c_4)$. Thus, the total depth of the encoding
circuit executable in the LNN architecture will be equal to
(s + t + 1) if it is allowed to be composed of generic two-qubit
gates. This is almost half of what was expected if this
circuit were matched to the SC first. This translates to a depth
2(s + t + 1) circuit with controlled-NOT, controlled-Z and
SWAP gates. Similarly, the depth of the error syndrome
circuit composed with generic gates and executable in the LNN
architecture is (s + t - 1).

Application of the technique described in this subsection to 
executing the error syndrome circuit for Steane's code ([5],
Fig. 10.16) in the LNN architecture shows that this can be 
done in 12 stages composed of generic two-qubit gates or 
26(= 2*12 + 2) stages composed of Hadamard, controlled- 
NOT, controlled-Z, and SWAP gates. We can show that the 
encoding circuit of [24], Fig. 8b, can be executed in 23 
stages composed of generic two-qubit gates or, alternatively, 
68 (= 3 * 23 + 1 - 2: pairs CNOT-SWAP must be combined, 
we need an extra level for Hadamard gates, but do not need to 
apply last SWAP) stages composed of CNOT and Hadamard 
gates in the LNN architecture. Our result for the depth, 68, is 
notably better than 177 found by the automated procedure of
[24].


III. LOWER BOUNDS 

In this section we study lower bounds on the depth of the
skeleton circuit SC defined in equation (1), assuming all gates are
present (i.e., each $i_* = 1$). We further assume that a pair of
gates $G(q_i,q_j)$SWAP$(q_i,q_j)$ requires two units of the
execution time, one for each of the gates. In practice, a direct
implementation of the pair $G(q_i,q_j)$SWAP$(q_i,q_j)$ may be more
efficient [6], but the particulars of such a construction depend
on the specific Hamiltonian, which is unknown in the
general case. The depth of the circuit illustrated in Fig. 1(c) is thus
(4n - 6). The lower bounds achieved below are directly
applicable to the QFT circuit.

To prove lower bounds, we need to restrict the set of
possible computations. We define two circuit type quantum
computational models A and B. We require that for each of them
in order to compute the SC (equation (1)) all two-qubit
gates need to be executed, and no ancilla qubits may be used.
Furthermore,

• in model A we assume that the gates required to be
executed in SC cannot be commuted (other than trivially:
a pair of gates operating on non-intersecting sets of
qubits always commutes);

• in model B we allow the possibility of executing
gates in any order (i.e., this lets us obtain bounds that
allow commuting gates through the circuit, without
worrying about which gates actually commute, and what
kind of corrections are needed in case they do not
commute).

The architectures considered in this paper are LNN, 2D square
lattice, and bounded degree graph with the degree of each
vertex no more than k. We next prove a number of lower bounds,
summarized in Table I.



TABLE I: Lower bounds on the depth of the SC in models A and B in
the LNN, 2D square lattice, and bounded degree graph architectures.

          LNN            2D square lattice   bounded degree graph
model A   10n/3 + O(1)   3n + O(1)           (2 + |)n + O(1)
model B   3n/2 + O(1)    £ + O(1)            (1 + |)n + O(1)



10n/3 + O(1) bound in LNN, model A. First, denote each
depth-1 computational stage (logic level) by L and each
depth-1 swapping stage by S. Every three stages of the SC have a
single fixed qubit that interacts with three other qubits. This
is either $q_1$, $q_2$, or $q_n$. Thus, every three logic levels have to
be separated by a round of SWAPs, each having depth at least
1, i.e., each sequence LLL must be replaced by LSLL or LLSL
to be able to run the circuit in the LNN architecture. We call
this the 3L → 1S requirement. With the 3L → 1S requirement, the
total depth must be at least $2n - 3 + \lceil (2n-5)/2 \rceil = 3n + O(1)$
logic levels. Therefore, using just the 3L → 1S requirement
proves that our circuit is at most a factor of 4/3 off the optimum. We
now improve this bound to 10n/3 + O(1) by showing that every
4 computational stages must be separated by a swapping stage
of depth at least 2 (the 4L → 2S requirement). 4L → 2S is slightly
more restrictive than 3L → 1S. The difference between the
two is that in one LLSLL is allowed, but not in the other. We
next prove that a depth-1 level does not suffice in separating
some two computational stages from the following two by
exploring the properties of SC and the LNN architecture.

Assume all 4 computational stages $L_i$, $L_{i+1}$, $L_{i+2}$, and $L_{i+3}$
are solely in the first half of SC. The second half is
symmetric to the first half and thus a similar proof holds for it.
We do not prove the boundary case (where one part of the
4-stage computation is in the first half of the SC and the other
part is in the second half) because its contribution to the final
figure is only a constant. Next, assume i is odd. The proof
for even values of i is analogous. Name the qubits $q_1, q_2, \ldots, q_n$
top to bottom. The computational stages $L_i$ and $L_{i+1}$ use
interactions $q_{i+2} - q_1$, $q_1 - q_{i+1}$, $q_{i+1} - q_2$, ..., $q_{(i+1)/2} - q_{(i+3)/2}$,
which in the LNN architecture can only be aligned as follows:
$q_{i+2} - q_1 - q_{i+1} - q_2 - \ldots - q_{(i+1)/2} - q_{(i+3)/2}$. The computational
stages $L_{i+2}$ and $L_{i+3}$ use interactions $q_{i+4} - q_1$, $q_1 - q_{i+3}$,
$q_{i+3} - q_2$, ..., $q_{(i+3)/2} - q_{(i+5)/2}$. In particular, stages $L_{i+2}$ and $L_{i+3}$
require interaction $q_{(i+3)/2} - q_{(i+7)/2}$, and qubit $q_{(i+3)/2}$ is used both in
$L_{i+2}$ and $L_{i+3}$. However, we know that after completion of
stages $L_i$ and $L_{i+1}$, the architecture allows interactions in the
following order: $q_{(i+3)/2} - q_{(i+1)/2} - q_{(i+5)/2} - q_{(i-1)/2} - q_{(i+7)/2}$. The LNN
architecture distance between $q_{(i+3)/2}$ and $q_{(i+7)/2}$ is 4. A
depth-1 swapping reduces the architectural distance between these
qubits by at most 2, which is not enough for the desired
interaction to be allowed. Thus, the depth of swapping must be at
least 2. This concludes the proof of the 4L → 2S requirement.

We finalize the proof of the 10n/3 + O(1) lower bound by
observing that for a circuit with 2n + O(1) stages L we
need to have at least 4n/3 + O(1) stages S to satisfy the 4L → 2S
requirement. Thus, the total number of stages required to
execute SC in LNN is 10n/3 + O(1). This implies that the circuit
we constructed explicitly (Fig. 1(c)) must be within a factor of
6/5 from the optimum.
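
The counting step can be illustrated numerically. The sketch below is our own reading of the 4L → 2S requirement, stated as an assumption: among any four consecutive computational levels, the swap levels placed between them must total at least 2. It then finds, by brute force, the smallest number of swap levels compatible with that constraint. The function name `min_swap_levels` is hypothetical, and the search is exponential, so it is meant for small inputs only.

```python
from itertools import product


def min_swap_levels(num_L):
    """Smallest total number of swap levels S interleaved with num_L
    computational levels L such that any four consecutive L's have at
    least two S levels between them (brute force, small num_L only)."""
    gaps = num_L - 1                     # gap j sits between L_j and L_{j+1}
    best = None
    # A gap never needs more than 2 swap levels, so range(3) suffices.
    for g in product(range(3), repeat=gaps):
        if all(g[j] + g[j + 1] + g[j + 2] >= 2 for j in range(gaps - 2)):
            if best is None or sum(g) < best:
                best = sum(g)
    return best


if __name__ == "__main__":
    for m in range(4, 11):
        print(m, min_swap_levels(m))
```

Under this reading the minimum is 2 swap levels per 3 computational levels, so 2n + O(1) computational levels need about 4n/3 swap levels, for 10n/3 + O(1) levels in total.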

3n + O(1) lower bound in 2D square lattice, model A. We
prove that every three computational stages $L_{i-2}$, $L_{i-1}$, and $L_i$,
where i = 2k + 1 and $k = 1..\lceil (n-2)/2 \rceil$ (this means that all
computational stages are in the first part of SC; the proof for the
symmetric second part is similar) must contain at least one
swapping stage if run in the 2D square lattice architecture. We
prove this by finding three interactions that form a loop.
Vertices in such a loop cannot be isomorphically mapped to the
vertices of the 2D square lattice. The interactions that form such a
loop, assuming qubits are named $q_1, q_2, \ldots, q_n$ top to bottom,
are $q_{(i-1)/2} - q_{(i+1)/2}$ in $L_{i-2}$, $q_{(i-1)/2} - q_{(i+3)/2}$ in $L_{i-1}$,
and $q_{(i+1)/2} - q_{(i+3)/2}$ in $L_i$. This proves that for every possible value of k it is required
to have at least one swapping stage, which results in the
construction of the 3n + O(1) lower bound.
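
The loop argument can be illustrated with a short check. The sketch assumes, as elsewhere in our examples, that the gate on $q_a$ and $q_b$ belongs to stage a + b - 2, and it takes the loop to be the triangle on the qubits with indices (i-1)/2, (i+1)/2 and (i+3)/2. It also confirms by brute force that a small square grid contains no triangle, which is why the three interactions cannot all be available at once. The names are illustrative.

```python
from itertools import combinations


def stage_of(a, b):
    """Stage index of the gate on q_a and q_b (assumed rule: a + b - 2)."""
    return a + b - 2


def loop_for(i):
    """For odd i >= 3, return the three mutually interacting qubit pairs
    together with the stage in which each interaction occurs."""
    if i < 3 or i % 2 == 0:
        raise ValueError("i must be odd and at least 3")
    x, y, z = (i - 1) // 2, (i + 1) // 2, (i + 3) // 2
    return [((a, b), stage_of(a, b)) for a, b in ((x, y), (x, z), (y, z))]


def grid_has_triangle(rows, cols):
    """Brute-force check for three mutually adjacent cells of a square grid."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]

    def adjacent(u, v):
        return abs(u[0] - v[0]) + abs(u[1] - v[1]) == 1

    return any(adjacent(u, v) and adjacent(v, w) and adjacent(u, w)
               for u, v, w in combinations(cells, 3))
```

For every odd i the three pairs fall in stages i - 2, i - 1 and i, so each such window of three stages contains a triangle of interactions.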

The lower bound that we just proved may be interesting
to those experimentalists working on implementing 2D
architectures for quantum information processing. The lower
bound shows that, with certain restrictions, the QFT in 2D
square lattices cannot in principle be parallelized any more
efficiently than to a depth at least 3/4 of the depth of the QFT
circuit executable in the LNN architecture.

3n/2 + O(1) lower bound in LNN, model B. Recall that the
number of gates in SC is $n(n-1)/2$ and they all require different
qubit-to-qubit interactions to be available. Next, note that in
the LNN architecture application of a single SWAP may make
at most two new interactions become available for a gate to be
applied on. Thus, the total number of SWAPs that one must
execute in a circuit to go through all $n(n-1)/2$ possible
interactions is at least $\lceil (n(n-1)/2 - (n-1))/2 \rceil = \lceil (n-1)(n-2)/4 \rceil$. This means that
the total number of gates to be executed in the LNN
architecture to compute SC must be at least
$n(n-1)/2 + \lceil (n-1)(n-2)/4 \rceil = \lceil (n-1)(3n-2)/4 \rceil$.
At most $\lfloor n/2 \rfloor$ gates can be executed in parallel.
Thus, the depth of the circuit is at least the minimum
total number of gates to be executed divided by the maximum
number of gates that can be executed simultaneously,
i.e., 3n/2 + O(1).
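
The counting argument is easy to evaluate for concrete n. The following sketch assumes that the n - 1 nearest-neighbor interactions are available at the start, that each SWAP exposes at most two new interactions, and that at most floor(n/2) gates run per level; `model_b_lnn_bound` is a hypothetical name.

```python
def model_b_lnn_bound(n):
    """Depth lower bound for the SC on an n-qubit chain (n >= 2), model B."""
    gates = n * (n - 1) // 2                 # one gate per pair of qubits
    initially_adjacent = n - 1               # interactions available for free
    swaps = -(-(gates - initially_adjacent) // 2)   # ceiling division by 2
    total = gates + swaps
    per_level = n // 2                       # disjoint two-qubit gates per level
    return -(-total // per_level)            # ceiling division


if __name__ == "__main__":
    for n in (4, 10, 100):
        print(n, model_b_lnn_bound(n))
```

The value grows as 3n/2 up to an additive constant, consistent with the bound derived above.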

This lower bound is constructed based on the assumption 
that all gates in SC need to be executed, and does not take into 
account that the order they are executed in is important. Thus, 
the restriction on the form of the computation is significantly 
weaker than that for model A, and the proven lower bound is 
looser. 

Generalizing the above techniques, it can be shown that in 
an architecture where each qubit has a finite number of neigh- 
bors bounded by number k: 

• the lower bound for executing SC is (2 + |)n + 0(1) in 
model A; 

• the lower bound for executing SC is (1 + l)n + 0(1) in 
model B. 

The ^ + 0(1) lower bound announced in Table U follows 
from the second of these two statements. Given the linearity 
of proven lower and upper bounds, we have just shown the 
asymptotic optimality of the depth of our skeleton circuit in 
the restricted architectures considered in this paper. 



the application of our generalized technique we showed how 
the approximate QFT circuit can be executed in linear depth in 
the LNN architecture. We proved a number of lower bounds 
for the depth of QFT circuit, which are all a constant factor 
away (ranging from ^ to |, and depending on the computa- 
tional model and assumptions made) from the above upper 
bound. Some of our lower bounds can be used by
experimentalists working on implementing advanced architectures as a
guide to how complex architectures may need to be for
particular types of computations. For instance, we proved that, with
certain restrictions, the QFT circuit in 2D square lattices
cannot in principle be parallelized to a depth smaller than
3/4 of the depth of the QFT circuit executable in the LNN
architecture.

More importantly, we presented a constructive algorithm
for synthesizing linear depth stabilizer circuits in the LNN
architecture. In particular, we showed that any stabilizer circuit
can be executed in at most 30n + O(1) stages each composed
of generic two-qubit gates, which in the library with CNOT
and single-qubit gates translates to a circuit of depth at most
90n + O(1). This upper bound is asymptotically optimal. We
considered specific stabilizer circuits and showed how these
circuits can be executed faster than reported by previous
researchers [24].